Compact Recognizers of Episode Sequences
Authors
Mikhail J. Atallah (Purdue University)
Abstract
Given two strings $T = a_1 \ldots a_n$ and $P = b_1 \ldots b_m$ over an alphabet $\Sigma$, the problem of testing whether $P$ occurs as a subsequence of $T$ is trivially solved in linear time. It is also known that a simple $O(n \log |\Sigma|)$ time preprocessing of $T$ makes it easy to decide subsequently, for any $P$ and in at most $|P| \log |\Sigma|$ character comparisons, whether $P$ is a subsequence of $T$. These problems become more complicated if one asks instead whether $P$ occurs as a subsequence of some substring $Y$ of $T$ of bounded length. This paper presents an automaton built on the textstring $T$ and capable of identifying all distinct minimal substrings $Y$ of $T$ having $P$ as a subsequence. A substring $Y$ is minimal with respect to $P$ if $P$ is not a subsequence of any proper substring of $Y$. For every minimal substring $Y$, the automaton recognizes the occurrence of $P$ having the lexicographically smallest sequence of symbol positions in $Y$. It is not difficult to realize such an automaton in time and space $O(n^2)$ for a text of $n$ characters. One result of this paper consists of bringing those bounds down to linear or $O(n \log n)$, respectively, depending on whether the alphabet is bounded or of arbitrary size, thereby matching the respective complexities of off-line exact string searching. Having built the automaton, the search for all lexicographically earliest occurrences of $P$ in $T$ is carried out in time $O(n + \sum_{i=1}^{m} rocc_i \cdot i \cdot \log n \cdot \log |\Sigma|)$, where $rocc_i$ is the number of distinct minimal substrings of $T$ having $b_1 \ldots b_i$ as a subsequence. All log factors appearing in the above bounds can be further reduced to $\log \log$ by resorting to known integer-handling data structures.
Index Terms: Algorithms, pattern matching, subsequence and episode searching, DAWG, suffix automaton, compact subsequence automaton, skip-edge DAWG, forward failure function.
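To make the two problems in the abstract concrete, here is a minimal Python sketch. It is not the automaton from the paper: it is a simple baseline that assumes sorted position lists per symbol and binary search, so each subsequence query costs about $|P| \log n$ steps rather than the $|P| \log |\Sigma|$ comparisons quoted above. All function names (`build_index`, `earliest_match`, `latest_start`, `minimal_windows`) are hypothetical, the pattern is assumed non-empty, and windows are reported as inclusive 0-based `(start, end)` pairs.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_index(text):
    """Map each symbol to the sorted list of its positions in text."""
    index = defaultdict(list)
    for pos, ch in enumerate(text):
        index[ch].append(pos)
    return index


def earliest_match(index, pattern, start=0):
    """Leftmost positions matching pattern as a subsequence at or after start.

    Returns None if pattern is not a subsequence of text[start:].
    """
    positions = []
    cur = start
    for ch in pattern:
        occ = index.get(ch, [])
        k = bisect_left(occ, cur)      # first occurrence of ch at position >= cur
        if k == len(occ):
            return None
        positions.append(occ[k])
        cur = occ[k] + 1
    return positions


def is_subsequence(index, pattern):
    """Decide whether pattern is a subsequence of the indexed text."""
    return earliest_match(index, pattern) is not None


def latest_start(index, pattern, end):
    """Largest s such that pattern is a subsequence of text[s..end].

    Assumes a forward match ending exactly at `end` exists (non-empty pattern).
    """
    cur = end
    for ch in reversed(pattern):
        occ = index[ch]
        k = bisect_right(occ, cur) - 1  # last occurrence of ch at position <= cur
        cur = occ[k] - 1
    return cur + 1


def minimal_windows(text, pattern):
    """Enumerate all minimal substrings of text containing pattern as a subsequence."""
    index = build_index(text)
    windows = []
    start = 0
    while True:
        match = earliest_match(index, pattern, start)
        if match is None:
            return windows
        end = match[-1]                           # earliest possible right end
        begin = latest_start(index, pattern, end)  # shrink from the left
        windows.append((begin, end))
        start = begin + 1                          # next window must start later


if __name__ == "__main__":
    print(minimal_windows("abcab", "ab"))
```

Each reported window is minimal in the sense of the abstract: the right end is the earliest one reachable from the current start, and the left end is the latest one that still admits a match, so no proper substring of the window contains the pattern. A length bound on $Y$ can then be checked by filtering on `end - begin + 1`.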
Similar resources
Discerning Structure from Freeform Sketches
Abstraction: The MultiTree communication protocol abstracts recognizers from one another and from the application. Incrementality: Recognition can occur incrementally as new strokes are added to the page with minimal computation, and this is built into the architecture. Efficiency: The MultiTree is compact and supports data caching on each of the nodes to allow recognizers to store precomputed values for ...
ON GENERAL FUZZY RECOGNIZERS
In this paper, we define the concepts of general fuzzy recognizer, language recognized by a general fuzzy recognizer, the accessible and the coaccessible parts of a general fuzzy recognizer, and the reversal of a general fuzzy recognizer. Then we obtain the relationships between them and construct a topology and some hypergroups on a general fuzzy recognizer.
A method and browser for cross-referenced video summaries
We present an automatic tool for compact representation and cross-referencing of long video sequences, which is based on a novel visual abstraction of semantic content. Our highly compact hierarchical representation results from the non-temporal clustering of scene segments into a new conceptual form grounded in the recognition of real-world backgrounds. We represent shots and scenes using mosa...
Generation of Handwritten Characters with Bayesian network based On-line Handwriting Recognizers
In this paper, we propose a new character generation method from on-line handwriting recognizers based on Bayesian networks. On-line handwriting recognizers are trained with handwriting samples from many writers. Then, character shapes are generated from given texts by searching the most probable input point sequences. Since Bayesian network based classifiers have large number of parameters for...
Generating Artificial Corpora for Plan Recognition
Corpora for training plan recognizers are scarce and difficult to gather from humans. However, corpora could be a boon to plan recognition research, providing a platform to train and test individual recognizers, as well as allow different recognizers to be compared. We present a novel method for generating artificial corpora for plan recognition. The method uses a modified AI planner and Monte-...
Journal: Inf. Comput.
Volume: 174
Publication date: 2002